ABSTRACT

A need exists for the National Institute of Education
(NIE) to extend the range of its concern with measurement into a
number of new areas. While the measurement of basic cognitive
abilities is well advanced, accurate measures of affective and
higher-order cognitive abilities are not generally available.
Measurement could also be extended into other dimensions as well;
specifically, the advancement of the ability to measure systems; the
development of the measurement sub-disciplines of sociology and
political science; improvement of unobtrusive data collection methods
such as observation; better support for the research and development
community; detection and measurement of unplanned consequences of
educational programs; identification of inputs, contexts, and
processes related to educational outcomes; and emphasis on the importance
of theory in deciding what needs to be measured. The author presents
tentative recommendations for initiatives into the newly-defined
areas of educational measurement. (NE)

PROBLEMS OF MEASUREMENT AND THE NIE PROGRAM

Introduction

Background

Attention to problems of measurement has been a salient concern
since the first thinking about the National Institute of Education began.
Indeed, the President's message on educational reform, which first
placed the formation of NIE on the government agenda, highlighted the
need "to develop broader and more sensitive measurements of learning
than we now have" (Nixon, 1970). This need was placed in the context
of the need for accountability of schools and teachers so that our
educational institutions might be more responsive to local requirements.

The establishment of an NIE Planning Unit inaugurated an extensive
planning process. Prominent individuals and groups of experts prepared a
wide assortment of papers, some focused on the contributions which
various disciplines might make to the study of education, some focused
on specific educational problems, and some providing syntheses of specific
recommendations (NIE, OPI 1973). An analysis of these papers revealed
that the need for new measures in education was a common theme running
through many of them (Kooi, 1972).

Writers of the NIE planning documents agreed that new
measurement procedures could be the basis for changes in the
present structure of education and allocation of resources
within it, or measures could provide new bases for credentialing
so that current educational requirements could become more
flexible. However, a program of exploration and development
would be needed to realize this potential. Though there are
some widely used tests that might adequately assess proficiency
in reading, mathematics, and the sciences, there are virtually
no generally acceptable instruments for assessing complex
problem-solving skills and social-emotional behavior. For
NIE to sponsor development of even rough milestone measures
of learning in these domains would represent a vital and
useful beginning. The purpose of this NIE initiative is
to take the first step of examining educational measurement
needs and designing a program to fill gaps in the area.
During the coming year, the Institute should explore new
techniques such as criterion-referenced (or domain-referenced)
tests which sample behaviors and skills in specific areas
directly and do not attempt to compare the student with others
nor to predict his future ability. Another promising direction,
both for individual measures and for developing social indicators
for learning situations, lies in the expansion of direct
observational methods.

Before new techniques are expanded, however, the availability
and sufficiency of measurements must be determined. Information
is needed on what behavior should be tested, what tests
are available, and how current measurements will work. When
promising measures are identified, but validity, reliability,
or standardization data are missing for them, this data should
be collected. Such a study will identify gaps in traditional
and new measurement so that a rational NIE program can be
designed.

The crucial need for the improvement of measurement in the disciplines
underlying educational research has also been expressed. For example, the
prominent sociologist/methodologist Hubert M. Blalock notes that:

...certain kinds of inadequacies in measurement procedures
may very well provide the major obstacle to be overcome if
sociology is to mature in the direction of becoming a "hard"
and disciplined social science. (Blalock, 1969)

The Institute was actually established in 1972. The authorizing
legislation lists four purposes for NIE:

• help to solve or to alleviate the problems of, and promote
the reform and renewal of American education;

• advance the practice of education, as an art, science, and
profession;

• strengthen the scientific and technological foundations of
education; and

• build an effective educational research and development
system. (Education Amendments of 1972, Title III,
Sec. 405. (a)(2), p. 99.)

The need for good measurement is basic to all these objectives, but
perhaps it is most convenient to think about it in relation to the
third and fourth objectives. Good measurement is part of virtually all
educational processes, beginning with the teacher's need to assess
the performance of her pupils and including the assessment of teachers and
schools, and making decisions and resource allocations at local, state, and
national levels. Because measurement is so basic, it will inevitably be
a part of any program which NIE undertakes. One of the issues which this
paper must consider is which measurement-related activities are most
appropriately organized on a focused, centralized basis and which are
best handled within the context of specific programs.

With the formal establishment of the Institute, "new measures in
education" was recognized as the subject for continued program development
work, first within the context of the New Initiatives Task Force and then
as part of the Exploratory Studies Unit. A small conference was held in
Princeton on October 2, 1972, under the sponsorship of the Educational
Testing Service and the Center for the Study of Education.*

*Conference participants were: Scarvia B. Anderson, Samuel Ball,
Samuel Messick, [...] Rosenthal, and E. Belvin Williams, all of ETS;
Cornelius Butler and Ward Mason, both of NIE; Donald [...] of the
University of Chicago; Douglas Jackson of the University of Western
Ontario; Silvan Tomkins of Rutgers University; Stephen Klein,
Beverly Kooi, and Robert Pace of CSE.

Following the conference, two documents were prepared. Beverly
Kooi, a consultant to NIE, drafted a statement summarizing the statements
prepared by conference participants in eight problem areas thought of as
a system of interacting variables (Kooi, 1972):

• Personal and social values and their educational
implications

• Treatments as experienced by individual learners

• General environments in which learning takes place
(including home, community, and school)

• Specific aspects of cognitive/intellectual development

• Specific aspects of personal/social development

• Cognitive styles

• Theory and methodology (evaluation and research
design; methodology of measurement per se and of
research design)

• Costs (people and financial)

Second, Mason outlined some tentative program ideas for NIE derived from
the conference results organized around two themes: (a) activities aimed
at building the R&D infrastructure, and (b) activities aimed at collecting
and analyzing data for use in policy research (Mason, 1972).

It is the purpose of the present paper (1) to provide a broad
survey of issues and problems in education and educational R&D which
relate to measurement; (2) to present an overview of current NIE
activities which are relevant to these problems and issues; and (3) to
present some tentative recommendations for NIE initiatives. The
recommendations are tentative for several reasons. The scope of this
field is so broad that it would be impractical to present a thorough
analysis of each problem leading to a final recommendation; nor would
any one individual be competent enough to make an equally credible
presentation of every issue. Further, it is important that the staff
assigned to develop any given program have a central role in developing
specific program plans. It is hoped that this paper will be able to
identify some "places to start", and that appropriate organizational
units or task forces can be formed to refine, elaborate, or reject each
recommendation, as may be most appropriate.

What Needs to be Measured? 

Although much of the discussion of the need for new measures in
education has focused on the needs to measure pupil outcomes other than
the usual cognitive skills, this is only part of the problem. Herriott
and Muse make the useful distinction between variables at the individual
level and those at the system level and note that such variables can
serve as either independent or dependent variables (Herriott and Muse,
1973). A cross classification of these elements produces the following
typology:



Classificatory Schema Depicting Focus of the
Independent and Dependent Variables in Studies of
Educational Effects

                          Independent Variable
Dependent Variable      Individual      System

Individual                  A              B

System                      C              D

They point out that most educational research traditions can be classified
in one of the cells of this table. Thus much of the research in educational
psychology seeks to relate the personal and behavioral characteristics of
teachers to test scores of pupils (cell A); social psychology has fostered
a line of inquiry focusing on the impact of institutional factors on
students, mostly at the college level (cell B); and economics has confined
itself largely to the study of production functions of education - how
educational resources interact with student characteristics to produce
variation in student behavior (cell D). They point out the limitations
of each of these traditions and call for the development of more
comprehensive conceptual frameworks.

A key point is that a given variable can play various roles, depending
on the problem and the analytic scheme. Thus a measure of student
attitudes might be important both as an input and an output variable; if
the same variable were aggregated by peer groups it might be a measure
producing contextual effects. Thus it is not possible to classify
measures once and for all in terms of their analytic role; NIE needs to be
concerned with the development of measures serving many analytic functions,
and not simply pupil outcomes.
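
The point about aggregation can be made concrete with a minimal sketch. The records, group labels, and function names below are hypothetical and are not drawn from Herriott and Muse; the sketch only shows how one individual-level attitude score can also serve, once averaged over a peer group, as a contextual variable attached to each member of that group.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical individual-level records: (student, peer_group, attitude_score).
records = [
    ("s1", "g1", 4.0),
    ("s2", "g1", 2.0),
    ("s3", "g2", 5.0),
    ("s4", "g2", 3.0),
    ("s5", "g2", 4.0),
]


def group_means(rows):
    """Aggregate individual scores into one mean score per peer group."""
    by_group = defaultdict(list)
    for _student, group, score in rows:
        by_group[group].append(score)
    return {group: mean(scores) for group, scores in by_group.items()}


def attach_context(rows):
    """Pair each student's own score with the mean score of the peer group."""
    means = group_means(rows)
    return [(student, score, means[group]) for student, group, score in rows]


# Each tuple holds (student, individual-level measure, contextual measure).
contextual = attach_context(records)
```

The same raw scores thus appear twice in an analysis: once as a characteristic of the individual, and once as a characteristic of the setting in which that individual learns.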

Who is the Client and What is the Purpose? 

It is a generally accepted principle that somewhat different kinds
of measures have to be constructed for different purposes. Cronbach
distinguishes (a) selection and classification of persons, (b) evaluation
of treatments, and (c) checking on scientific hypotheses (Cronbach, 1970).
Identifying different clients or users also helps to identify
different purposes. Practitioners are the primary clients of the
testing industry. Traditional uses of test information by teachers
include diagnostic and prescriptive decisions regarding individuals and
groups. Administrators use test information for making decisions
regarding programs and allocation of resources. They rely on many other
kinds of data as well. Student record files contain information on pupil
achievement, plus health, family, and other kinds of data. Schools and
school systems also have elaborate record keeping systems for fiscal,
personnel, and other information which provide statistics for local, state,
and federal use. Increasingly these various kinds of data are being
used for program evaluation and as parts of management systems seeking
to assure "accountability".

The researcher generally has rather different purposes in mind.
Primarily he is interested in relationships among variables and in making
causal inferences. Researchers can make a great contribution to the
determination of the construct validity of measures by showing how they
are part of systems of variables, and through studies of the population
and ecological validity of measures, showing what variations in
interpretation follow from the use of given measures with different sub-groups
and in different contexts (Anderson, Messick, and Hartshorne, 1972;
Cronbach, 1971).

The developer generally has purposes that overlap those of both
practitioners and researchers. To the extent that the product to be
developed incorporates the use of tests or other measures, the developer's
purposes coincide with those of the practitioner. However, the
development process itself requires the use of measures for purposes like those
of researchers and evaluators.

The special needs and perspectives of evaluators, policy makers,
change agents, and others might also be detailed. NIE needs to concern
itself with this total range of clients and purposes, and not simply with
the development of new tests for use in operating school systems.

Measurement of Individuals 

New Learning Outcomes.

The most common point of entry into this problem area has been the
observation that education has been focused on too narrow a range of
cognitive outcomes and that measures should be developed for other kinds of
objectives. This is, of course, in the first instance an argument
concerning the goals of education rather than measurement per se, but implicit
is the thought that we often pay more attention to things that have been
quantified. For example, the President's message on educational reform
called for new measures of achievement:

To achieve... fundamental reform, it will be necessary
to develop broader and more sensitive
measurements of learning than we now have.

The National Institute of Education would take
the lead in developing these new measurements
of educational output. [...]
pay as much heed to what are called "immeasurables"
of schooling (largely because no one has yet learned
to measure them) such as responsibility, wit, and
humanity as it does to [...] and mathematics achievement.
(Nixon, 1970, p. 3)

In a report prepared for the NIE Planning Unit, Etzioni
distinguishes between instrumental and expressive goals of education
and states that we have tended to overemphasize the instrumental goals.
He feels that this imbalance should be corrected and calls for the
development of expressive tests (Etzioni, 1972).

Another planning report, by Kooi and Associates, provides a goal
analysis structure as follows (Kooi et al., 1972):

A. Learning goals

   1. Social and emotional development

      a. Self-acceptance

      b. Relating to others

      c. Responsibility

      d. Adaptability

   2. Cognitive development

   3. Physical development

B. Enabling goals

C. Systems goals

   1. Productivity

   2. Access

   3. Participation

This schema is especially useful in making it clear that not all
educational goals can be reduced to learning goals. Each goal area implies need
for measures to assess progress toward the goal.

Levien also calls for:

...development of techniques and instruments for
evaluating a far broader range of education results
than are commonly considered. Among the requirements
are:

• Methods for assessing psychological development,
cognitive and motivational...

• Methods for assessing learning outcomes referenced
to objectives...

• Methods for assessing social development...

• Methods for assessing the development of learning
skills and incentives.

Techniques should also be developed for identifying and
measuring some of the reasonably objective consequences
of educational programs on society, and some of the
educational effects of outside-the-school influences:
family, friends, television (Levien, 1971, pp. 79-80).

Krathwohl and Payne note that educational objectives for individuals
can be stated at three or four levels of specificity (Krathwohl and
Payne, 1972). At the most general level, there are many statements
of objectives that have been formulated by national commissions,
professional groups, and prominent individuals. Such statements commonly
give as much prominence to non-cognitive objectives as to cognitive.
However, they note that in curriculum building efforts complex objectives
are likely to drop out.

This erosion-of-effort is particularly likely to occur
with affective objectives. The conceptual structure
of nearly all new efforts at curriculum building
includes affective objectives in some important way. But
as the structure is developed, such objectives cease to
influence the direction of instruction, the choice of
activities, or what students learn. As objectives to
be achieved concomitantly with cognitive objectives,
they are not taught directly, and it is often merely
hoped that they will be achieved with no concentrated
effort on them...

An additional important factor is that students will
typically seek to learn those aspects of a course that
will earn them a good grade; affective objectives
rarely play any significant part in grading.
(Krathwohl and Payne, 1972, pp. 35-36.)

However, there is some question concerning the degree to which
one should expect affective objectives to be reflected in and achieved
through the explicit curriculum system. There are many elements of
social structure and process which in effect constitute an implicit
curriculum having important consequences for the affective outcomes of
education. We also need to note the importance of many other factors
such as family values and community contexts in determining affective
outcomes. The point is not to question the importance of measuring
affective outcomes but to question the apparent assumptions that all such
outcomes need to be represented in the explicit curriculum or that they
are solely determined by school experiences. Given such multi-factor
determination, if we are to measure affective outcomes we must avoid
simplistic models which ascribe to the schools the sole responsibility
for determining such outcomes.

The determination of what new learning outcomes need to be
measured is, of course, partly a matter of the selection of goals and
objectives, and is thus a political process requiring input from many
sectors. NIE is supporting several activities which contribute to
this process.

• The Center for the Study of Evaluation at UCLA has
developed a needs assessment kit which provides a
means for local schools to work with communities
to identify and select goals for school programs.

• The NIE Research Division is conducting a series of
surveys and laboratory studies to map the educational
goal structure of the lay public.

• The Office of Research Grants is supporting a
follow-up study of Project TALENT participants
which will assess the efficacy of past and present
educational programs which [...]
individuals to achieve their life goals and is
expected to contribute to the formulation of
educational priorities and goals.

However, not all needed outcome measures can be tied to pre-determined
objectives. Sociologists have long stressed the importance of searching
for unintended and unanticipated consequences of purposive social action
(Clarke 1973), and this point has been emphasized by Michael Scriven
in the context of educational evaluation (1972). Clearly we need to be
able to detect and assess effects whether or not the program designer
planned them. Sensitivity to possible side-effects might come from
use of a different disciplinary perspective, or from insight born of
experience.

Cronbach has expressed the dilemmas about whether what we
can measure are the most important things, and whether to emphasize the
empirical or theoretical approach to instrument development:

Only the strict empiricists, those who eschew theory
as entanglement, have been marketing practical new
products and procedures. I cannot escape the feeling
that the things actuarially scored tests cannot do
are more important than the things they can do. Is the
time not ripe for a wholly fresh effort to construct
a new generation of tests? Or must testing based on theory
wait until theoretic and metatheoric problems are
better resolved? (Cronbach, 1970, p. xxviii).

From both the R&D perspective and that of practical use,
conceptualization is of great importance. There are several advantages.

Measurement development pursued as part of a theoretical
framework instead of on an ad hoc basis permits one
to (a) evaluate the adequacy of the measurement in
terms of the meaning of the construct, (b) consider
individual score differences as representing more or less
the trait measured, and (c) compare and integrate results
across studies in terms of common constructs.

If we eventually want to use measurement for practical
purposes such as diagnosis and evaluation, we must be
prepared to justify that use in terms of the social
consequences, and these cannot be evaluated without
information about the meaning of the measure. No
accumulation of sterile statistics can compensate
for lack of understanding. (Anderson, Messick, and
Hartshorne, 1972, p. 2).

It is not possible within the confines of a paper like this
to come up with a specific list of variables for which new measures
are needed. What we urge is that program managers and evaluators
throughout NIE become sensitized to the need to consider a much broader
range of human abilities (as both inputs and outputs). This is already
going on within a number of programs. However, there is a problem in
that these efforts tend to go on in isolation from one another; for
example, there is a lack of compatibility among the measures used for
the evaluation of different Career Education models and among those used
by the several evaluation contractors of the Experimental Schools Program.
In the final section of this paper an agency-wide task force is
recommended which would help to identify common needs for new measures
among programs and coordinate measurement development activities. Not
only would possible redundancy be avoided, but an important contribution
would be made to forming bridges of comparability among programs. One
of the most important barriers to the cumulation of knowledge in
educational research has been the lack of agreed upon common measures
of educational phenomena. To the extent that NIE can provide leadership
in identifying and developing new measures of wide use and credibility,
it will have taken a major step toward improving the cumulative
character of the knowledge base.

While current problem-oriented programs are providing some support
for measurement development, it is the nature of the case that these efforts
tend to be short range and program dependent. Furthermore, it is
difficult to put aside sufficient program money for measurement development
when the thrust of events is to "get the job done". As part of the matrix
management scheme proposed below it is therefore recommended that the
agency-wide Task Force on Measurement have funds at its disposal with
which to support the development of new measures which are expected to
be of wide applicability in research and/or practice.

RECOMMENDATION:

• The NIE budget should set aside $300,000 in FY 74
and $1,000,000 in FY 75 for development of new
measures of wide applicability in research and/or
practice. These funds should be under the control
of the agency-wide Task Force on Measurement and
would be supplemental to funds used by individual
programs to develop program-related measures.

So far the discussion has been confined to the measurement of
pupil outcomes. In the last few years the measurement of teacher
competencies has achieved considerable importance with the passage of
legislation in several states requiring that teachers be evaluated on
their competencies (Popham, 1972). Although the history of research on
"teacher effectiveness" is long, its results have been meager. The new
legislation found the field quite unprepared with regard to the
availability of a suitable array of teacher measures.

The Office of Research and Exploratory Studies has a Task Force
on Education Personnel. The role of the teacher is of crucial importance
in any educational program, and the work of this unit has the potential
of considerable impact on other activities within NIE. Improvement in
the conceptualization of teacher functions and their measurement should
play a central part in the work of this unit.

RECOMMENDATION:

• The Task Force on Education Personnel should give
a prominent place in its program to the development
of measures of teacher competencies and activities
as needed for new teacher accountability regulations
and for the implementation of innovative educational
programs.

Availability and Quality.

Some empirical data have been published on the relative importance
of different educational goals and on the availability of tests for the
different goals. In its continuing program of evaluation technologies
the UCLA Center for the Study of Evaluation obtained data from a national
sample of 2,555 elementary school principals, teachers, and parents
on their ratings of 106 educational objectives. Although most of the
objectives listed referred to cognitive skills and knowledge in a
variety of subject areas, the ten top-rated goals were mostly
non-cognitive.

Top Ten Goals for Elementary Education Derived from Ratings
of a National Sample of Principals, Teachers, and Parents,
and Availability of Published Tests for these Goals in 1970-71.

Rank   Goal                                No. of Tests*

  1    Self-Esteem                               5
  2    Citizenship                               0
  3    Socialization-Rebelliousness             11
  4    Need Achievement                          1
  5    School Orientation                        9
  6    Neuroticism-Adjustment                   30
  7    Listening Reaction and Response          15
  8    Attitude Toward Reading                   0
  9    Silent Reading Efficiency                21
 10    Dependence-Independence                  16

Source: Hoepfner, Bradley, Klein, and Alkin, 1972, p. 24;
and Hoepfner, in press.

The availability of tests is very uneven for both cognitive and
affective objectives. For all 106 goals, the correlation between the
rating of importance and the number of tests available was only +.27, and
many goals had no tests at all (Hoepfner, in press).

The availability of a test and its quality are quite separable.
The major sources of information about test quality are the Buros
Handbooks and the Test Evaluations published by the Center for the
Study of Evaluation (the latter with NIE support), although some of the
other compilations cited in the bibliography have such information. Both
Buros and CSE state that many tests are of relatively low quality.

Test publishers continue to market tests which
do not begin to meet the standards of the rank
and file of (Mental Measurements Yearbook) and
journal reviewers. At least half of the tests
currently on the market should never have been
published. Exaggerated, false, or unsubstantiated
claims are the rule rather than the exception.
Test users are becoming more discriminating, but
not nearly fast enough. (Buros, 1972, pp. xxvi-
xxviii)

And CSE, commenting on its evaluation of tests of higher-order cognitive,
affective, and interpersonal skills:

In conclusion, it should be noted that, in the opinion
of the CSE staff, the "state of the art," as it is
presented here, leaves much to be desired. In terms of
quantity, of the 429 categories in the three classification
schemes, 183 (43%) are empty, and an additional 179 (42%)
contain 10 or fewer instruments. In addition, the quality
of the instruments, as expressed by their VENTURE evaluations,
is predominately poor to fair... The average ratings for
validity, normed excellence, teaching feedback, and
retest potential are uniformly poor, while the ratings
for examinee appropriateness and usability are predominately
fair, with good ratings on these two criteria occurring
most frequently in the interpersonal domain and least
frequently in the higher-order cognitive domain. In short,
much work remains to be done, both in developing instruments
where none now exist, and in improving the quality of those
instruments which have already been developed. (Hoepfner
et al., 1972, p. 24)
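
The category counts in the CSE passage can be checked with simple arithmetic. The sketch below is illustrative only: the variable and function names are assumptions, and the count of remaining categories is derived here by subtraction rather than reported by CSE.

```python
# Counts quoted from the CSE passage above.
TOTAL_CATEGORIES = 429
EMPTY = 183    # categories with no instruments at all
SPARSE = 179   # categories with ten or fewer instruments


def share(count, total=TOTAL_CATEGORIES):
    """Return a count as a whole-number percentage of all categories."""
    return round(100 * count / total)


# Derived by subtraction (not stated in the source): categories that
# contain more than ten instruments.
well_covered = TOTAL_CATEGORIES - EMPTY - SPARSE

empty_pct = share(EMPTY)
sparse_pct = share(SPARSE)
well_covered_pct = share(well_covered)
```

Under these figures the quoted 43% and 42% are reproduced, and only 67 of the 429 categories, roughly one in six, hold more than ten instruments.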
Several other components of the "infrastructure" of tests and
measurements should be mentioned (in addition to the Buros Handbooks
and CSE Test Evaluations). A number of compilations of measures of
classes of variables or specific variables have been published; these
have been starred (*) in the bibliography. The Educational Testing
Service maintains a library collection of published tests and publishes
the Test Collection Bulletin. It should be noted that the needs for
instruments for school use and for R&D use are not met equally well.
There is a considerable market for standardized tests in the schools,
and the "testing industry" makes them readily available, along with
scoring services. However, the researcher tends to be concerned with
a much broader range of variables than the practitioner, and very often
even when an appropriate measure has been developed it has not been
published and is not available in quantity.

In addition NIE supports an ERIC Clearinghouse on Tests, Measurement,
and Evaluation at ETS which not only provides input to the ERIC system
but also commissions "information analysis products". A number of
professional organizations give prominent attention to the measurement
field, including the American Psychological Association (especially
Division 5), the American Educational Research Association (especially
Division D), and the National Council on Measurement in Education.

Despite these many services, activities, and organizations, it is
fair to say that, for one reason or another, many researchers and
practitioners still experience great difficulty in locating instruments
of specific characteristics for given purposes which have been properly
reviewed and evaluated. This felt need resulted in the formation of
the Inter-Association Council on Test Reviewing (IACTR) in 1968
(Payne and Watkins, 1973). The Council did a study which did much
to identify problems and propose solutions. Unfortunately the
organization lacked a firm institutional base and the necessary financial
backing and was dissolved in 1972.

Information about the quality of measures is needed by various
clients for various purposes. The IACTR experience should be examined
carefully to determine whether NIE should sponsor an activity to meet
the needs identified by that group. This field might be a prime candidate
for establishment of a new institution. None of the laboratories or
centers in which NIE supports programs has a major focus on measurement.
The closest is the Center for the Study of Evaluation at UCLA, but it
deals with evaluation rather than measurement and does not deal with the
full range of measures used in educational research and practice. The
Buros Handbooks are a personal project of the editor-publisher, who is
of retirement age, and thus lack any institutional base; whether the
series will be continued is problematic. In addition there is a
good deal of current discussion of the need for item banks and related
services. This could be another function of a new institution.

Problems of access to information about tests and measurements
exist within NIE as well as in the field generally. An attempt has
been made to order the key reference volumes for the NIE library, and
the Educational Reference Center provides search and retrieval services.
However, especially with a growing intramural research program, these
general services may need to be supplemented by somewhat more specialized
activities. It is proposed that an information specialist in measurement
be added to the staff of the newly formed Educational Reference Division
who can assist NIE personnel in obtaining information about tests,
research instruments, and specialized collections found elsewhere, such
as the ETS collection of published tests, data tapes, item banks, etc.

NIE is beginning the design of a series of periodic and special
studies of the R&D system. One element of this program should be the
examination of the resources and services available to researchers and
practitioners for the improvement of measurement.

Some problems have been noted within NIE in the rigor and
consistency with which standards regarding instrumentation have been applied
in the review of proposals and the monitoring of projects (Beezer, no
date). In the past, some activities have been supported which were not
sufficiently rigorous from the measurement perspective. The forms
clearance procedure has been concerned primarily with issues of
respondent burden and invasion of privacy, not technical adequacy.
Proposals focusing on measurement issues have been reviewed very
carefully with respect to instrumentation, but often proposals with
more substantive foci have been approved even though they provide
almost no information on instrumentation. An NIE consultant, John
Tuckey, has suggested a system of "circuit riders" who might provide
consultative services to principal investigators needing such assistance.
This might be helpful, but we also need to introduce more rigor in NIE
procedures before proposals are funded. It is proposed that the
agency-wide Task Force on Measurement examine existing procedures for
proposal and RFP review in order to strengthen these standards.

To summarize recommendations concerning the availability and
quality of measurement instruments:

RECOMMENDATIONS: In support of its mission to
strengthen the scientific and technological
foundations of education and build an effective
educational research and development system,
NIE should support the following infrastructure
building activities in the agency and in the
field:

• An information specialist in measurement should be
added to the staff of the Educational Reference
Division to assist staff in obtaining information
about tests, research instruments, and data
sources.

• An instrumentality should be created and supported
for expanding and improving the review and evaluation
of measurement instruments, including measures of
non-cognitive abilities and variables of primary
interest to the R&D community.

• An instrumentality should be created and supported
which would publish or otherwise make available
instruments in the public domain or under license
which meet standards of quality and need but for
which the market is too thin to invite commercial
publication.

• The program of research on the R&D system should
include a study of the infrastructure supporting
the measurement needs of various agents in the R&D
system and make recommendations for meeting other
unmet needs through the establishment of new institutions
and/or by other means.

• NIE should revise its procedures for review of
proposals, RFP's, and forms to involve experts
on instrumentation and methodology to assure
a higher level of technical quality in the
research and development supported by NIE in its
intramural and extramural programs.
Individual Effects of Testing and Problems of Bias

The use of tests has grown rapidly since the turn of the century,
particularly in the public schools during the last 15 years (Kirkland,
1971). Between 150 million and 250 million tests a year are given,
or three to five standardized tests per pupil per year. In addition there
are the external testing programs such as the College Entrance Examination,
the National Merit Scholarship, and the American College Testing Program;
and the use of tests by industry, business, government, and the military
establishment.
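
The two figures cited above can be checked against each other. The sketch below assumes that the low test count pairs with the low per-pupil rate and the high count with the high rate; the source does not say how the two ranges correspond, and the function name is illustrative only.

```python
# Consistency check on the test-volume figures cited from Kirkland (1971).
# Assumption: the low count pairs with the low per-pupil rate and the
# high count with the high rate; the source does not state this.

def implied_pupils(tests_per_year: int, tests_per_pupil: int) -> float:
    """Number of pupils implied by a yearly test count and a per-pupil rate."""
    return tests_per_year / tests_per_pupil

low = implied_pupils(150_000_000, 3)
high = implied_pupils(250_000_000, 5)

print(f"Implied pupils (low pairing):  {low:,.0f}")
print(f"Implied pupils (high pairing): {high:,.0f}")
```

Under that pairing both ends of the range imply the same population of about 50 million pupils, so the two sets of figures are at least mutually consistent.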

Despite this apparent success, testing has increasingly become the
object of criticism. These criticisms have emanated from various
sectors, including school administrators (Joint Committee on Testing,
1962), and Blacks and other minority groups (R. Williams, 1970). Generally,
these criticisms can be divided into three groups: (1) scientific issues
concerning the validity of tests; (2) professional issues concerning
the misuse of tests; and (3) social issues concerning the consequences of
testing.

With reference to validity, Messick and Anderson note that the
lower scores typically obtained by minority and disadvantaged individuals
may be traced to three possible sources (Messick and Anderson, 1970):

1. The test may measure different things for different
groups.

2. The test may involve irrelevant difficulty:

(a) Items that are more germane to one group
than to another.

(b) Testing conditions that make some individuals
feel anxious, threatened, or alienated.

(c) Differences in test-wiseness.

3. The test may accurately reflect ability or achievement 
levels. 

Discussions of cultural bias in tests have emphasized one or another
of these factors. Some have gone as far as to propose a moratorium on
testing (R. Williams, 1970). Others have proposed approaches to the
elimination of specific sources of bias. There have been various
attempts to develop "culture fair" tests of intelligence. Some have
translated tests into the primary language of bilingual populations.
And others have tried to modify test administration procedures in order
to eliminate some kinds of irrelevant difficulty. None of these efforts
has been fully satisfactory, and thus a major problem remains with
respect to educational programs for bilingual and other sub-cultural
groups. There are several programs within NIE for which the problem of
bias should be a central issue (e.g., the task forces on bilingual
education and the urban disadvantaged). However, it does not seem to be
reflected in their plans as yet. NIE should organize a new task force
composed of measurement specialists and representatives of relevant
R&D programs to plan specific steps to deal with this problem area.

Another set of issues revolves around the misuse of tests. Tests
must be used for the purposes for which they were designed and interpreted
with reference to the design constraints. Professional standards exist
for the development and use of educational and psychological tests (Joint
Committee on Revision of Standards, no date).

One of the problems is that "a test might have a different validity
coefficient or a different regression function for a minority/poverty
group than for a middle class group and that the general use of prediction
equations derived from the White majority might unfairly penalize minority
individuals in selection or placement situations" (Messick and Anderson,
1970).
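
The point about differing regression functions can be made concrete with a small sketch. The data below are invented purely for illustration (they are not drawn from Messick and Anderson or any study), and the group labels and helper names are assumptions. The two groups are given the same slope but different intercepts, and the prediction equation fitted to one group is then applied to the other.

```python
# Illustrative only: hypothetical (test score, criterion) pairs, not real data.
group_a = [(40, 2.0), (50, 2.5), (60, 3.0), (70, 3.5)]  # equation is derived here
group_b = [(30, 2.0), (40, 2.5), (50, 3.0), (60, 3.5)]  # equation is applied here

def fit_line(pairs):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def predict(line, x):
    slope, intercept = line
    return slope * x + intercept

line_a = fit_line(group_a)
line_b = fit_line(group_b)

# Mean signed error (actual minus predicted) when group A's equation
# is used to predict criterion performance for members of group B.
errors = [y - predict(line_a, x) for x, y in group_b]
mean_underprediction = sum(errors) / len(errors)

print("group A line:", line_a)
print("group B line:", line_b)
print("mean underprediction for group B:", round(mean_underprediction, 3))
```

In this constructed case every member of group B performs half a criterion point better than group A's equation predicts, which is the kind of systematic penalty the quoted passage warns about.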

The National Assessment of Educational Progress (NAEP) has attempted
to deal with this problem through the concept of "balancing" (Robert
Larson et al., 1973). National Assessment reports its results in terms
of groupings by age, region, sex, size and type of community, color, and
level of parental education. Balancing is an adjustment procedure designed
to remove the masquerading of one group effect as another and to avoid
"double counting" individuals. An NIE grant is supporting the further
development of this method. In a similar vein, Mushkin has proposed the
"SIR" (sex, income, race) adjusted index of educational achievement
(Mushkin, no date).
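
The idea of one group effect masquerading as another can be illustrated with a simple direct standardization. This is a stand-in technique chosen for illustration, not the NAEP balancing procedure or Mushkin's index, whose details are not given here. The regions, education levels, scores, and counts are all hypothetical.

```python
# Hypothetical cells: (region, parental education) -> (mean score, pupil count).
# Within each education level the two regions score identically.
cells = {
    ("North", "high"): (60.0, 80),
    ("North", "low"): (40.0, 20),
    ("South", "high"): (60.0, 20),
    ("South", "low"): (40.0, 80),
}

def raw_mean(region):
    """Unadjusted regional mean, weighted by the region's own composition."""
    rows = [(m, n) for (r, _), (m, n) in cells.items() if r == region]
    return sum(m * n for m, n in rows) / sum(n for _, n in rows)

def standardized_mean(region):
    """Regional mean reweighted to the education mix of the whole sample."""
    totals = {}
    for (_, edu), (_, n) in cells.items():
        totals[edu] = totals.get(edu, 0) + n
    grand_total = sum(totals.values())
    return sum(cells[(region, edu)][0] * totals[edu] / grand_total
               for edu in totals)

for region in ("North", "South"):
    print(region, raw_mean(region), standardized_mean(region))
```

In this constructed example the raw means differ by 12 points, yet the gap vanishes once both regions are weighted to a common parental-education mix: the apparent regional effect was an education effect in disguise.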

Other problems of misuse can be listed (Messick and Anderson,
1970):

• Relevance of the selected test for the proposed purpose.

• Side effects (e.g., is test-taking a pleasant or
frustrating experience?).

• Misinterpretation of test results (e.g., the
presumption that test scores reflect fixed levels
of capacity, or the tendency to take seriously
insignificant differences between scores).

• The problem of secondary use, or the use of test
results obtained at one point in time for one set
of purposes at another point in time for different
purposes (raises issues of invasion of privacy,
confidentiality of records, and client welfare).

Issues with respect to the effects of testing on students, parents,
and teachers have been summarized by Kirkland (1971):*

• Effects on students: What are the effects of tests on the
motivation, self-esteem, and self-perceptions of
students? Do they affect study habits and teacher-pupil
relationships? Do they produce anxiety and emotional
tensions? Are pressures to achieve by teachers, parents,
and schools made as a result of tests? Do tests encourage
dishonesty in the form of cheating, faking, etc.? Do they
create labels of inferior or superior intellectual status?
Do they determine one's adult social status? What advice
is given students on the part of parents, teachers, and
schools as a result of test scores? What is the influence
of tests on the opportunities open to individuals?

• Effects on parents: What are the effects of tests on parents?
Do their children's test experiences produce tension and
anxiety in them? Does the importance that tests have in
selection and placement cause parents to inflict undue
pressures on their children? Does knowledge gained from
their scores influence parents' perceptions of their
children's abilities? Does this knowledge influence the
advice parents give to their children?

• Effects on teachers: Are pressures placed upon teachers
as a result of tests? Do tests determine teaching and
evaluation methods? Are teachers evaluated by these tests?
Do they color the teachers' perceptions of students? As
a result of tests, do teachers behave differently toward
students?

*Systems effects of testing are discussed in a later section.

In many respects the questions about the effects of testing on the
life chances of individuals are among the most serious raised. It is
charged that tests may predict the ability to do well in school, but
neither test results nor school grades predict success in occupations
(Jencks, 1972; Berg, 1970). This is as much a criticism of schooling
as it is of testing, although industrial use of testing is not notably
more successful. Of course it is not the sole function of the schools
to prepare people for jobs. But we need to be able to define the
knowledge and skills of a competent adult in his various roles and to
be able to determine whether the schools are making their proper
contribution toward education for adulthood. (Mobility issues are discussed
further in the section on measurement of systems.)

Granted that there are a number of problems associated with the
use of testing, there would also be social consequences of not testing,
as Messick and Anderson point out (Messick and Anderson, 1970). The risk
is that subjective forms of appraisal would be substituted with the
likelihood that bias and discrimination would increase.

The elimination of tests would also mean the loss
of one of the best ways for teachers to acquire a
useful appreciation of the broad range of competencies
and traits that characterize human behavior or to
develop needed sensitivities to the nuances of
cognitive growth.* An increased parochialism might
spread throughout education because of the absence
of a national normative perspective and the elimination
of access to concrete examples of what other educators
deem important to assess. And of utmost importance,
there would be an absence of yardsticks for gauging the
effectiveness of educational programs and for evaluating
the equity of the educational system. (Messick and
Anderson, 1970, p. 87).

*A reviewer of an earlier draft of this paper takes issue with
this point, feeling that the use of tests tends to narrow the sensitivities
of teachers. The difference may be between the potential use of tests
and what happens more typically. This issue would be worth investigating
empirically.

With reference to the entire range of problems identified under
the heading of "effects of tests and the problem of bias," a number of
current activities are worth noting.

• The "Standards for Development and Use of
Educational and Psychological Tests" is now in the fourth
draft of a revision under the sponsorship of three
professional associations.*

• In the spring of 1973 a National Workshop on Testing in
Education and Employment was organized to focus on the
need for reform in procedures for testing racial, ethnic,
and low socioeconomic groups in America.

• NIE made a grant in June 1973 to support, in cooperation
with three foundations, a project designed to study
the effects of testing in Ireland. Hitherto Ireland has
not used standardized tests in its schools. A decision has
now been made to introduce testing, and the project represents
an agreed-upon plan to do so under an experimental
design. The two main foci of the research are (1) to
study the consequences of introducing testing at the individual,
institutional, and cultural levels, and (2) to do a case
study of the research as an instance of planned social
experimentation.

• NIE has a legislative mandate to build an effective
research and development system, and the Planning
and Policy Analysis Unit of the Office of Research
and Development Resources is undertaking policy
studies to determine how best to fulfill that
mandate. Testing and the testing industry are part
of that system and will be included in a survey of
the R&D system now being designed.

*The Joint Committee on Revision of Standards includes representation
from the APA Committee on Psychological Tests, the APA Board of
Scientific Affairs Liaison, the American Educational Research
Association, and the National Council on Measurement in Education.
RECOMMENDATION:

• An Exploratory Studies group, working through
the agency-wide Task Force on Measurement, should
give high priority to a research program on the
effects of testing and the problem of bias, working
with the Task Force on Bilingual Education, the Task
Force on Education for the Urban Disadvantaged, and other
relevant units.

Theoretical and Methodological Issues 

Any detailed treatment of theoretical and methodological issues
tends to become quite technical and is probably beyond the competence of
the present author. There have been a number of recent statements
summarizing the state of the art and pinpointing areas where new work
is needed (Cronbach, 1970; Thorndike, 1971; McClelland, 1973; Ebel, 1973;
Kirkland, 1971; Krantz et al., 1972; Anderson, Messick, and Hartshorne,
1972). Certainly the field is in ferment, both in education specifically
and the underlying behavioral sciences generally. Here we will attempt
only a brief listing of some of the salient problem areas.

In the last ten years the concepts of criterion-referenced testing
(Glaser, 1963; Popham and Husek, 1969), domain-referenced measures
(Hively et al., 1973), and mastery learning (Block, 1973) have emerged.
The literature on these topics is still rather confused, and their value
for the improvement of education and educational research has yet to be
determined. Nevertheless, that potential is sufficiently challenging to
warrant NIE support of continued work in these fields.

The use of standardized tests developed to measure individual
differences for the purpose of evaluating educational programs has become
a controversial area. Fennessey has reviewed these issues and suggests
that their use may be quite appropriate under certain conditions
(Fennessey, 1973).

There has been an increasing dissatisfaction with cross-sectional
research designs and a growing interest in longitudinal research. This
requires the development of new methodologies appropriate to the
measurement of change. Several activities in this area should be noted. Two
federal interagency committees, one on early childhood and one on
adolescence, have supported a special interest group on longitudinal
research. The group has identified some of the problems of longitudinal/
intervention research and compiled information on important studies now
underway (Grotberg and Searcy, 1972; Grotberg, 1972; Laser, 1972).
They are now holding discussions concerning the possible use of "marker
variables," i.e., agreed-upon measures of key variables which would be
used in all related projects (regardless of what other measures were used)
so as to provide a link between similar studies and promote the cumulation
of knowledge. Trent and his associates have also compiled information
about longitudinal studies and done an analytic comparison of their
conceptual frameworks, methodologies, and findings (Trent et al.,
1972-73, 5 vols.). Finally, the Board on Human Resources of the National
Research Council has been examining and comparing various data sets
available from projects conducting longitudinal studies and from
professional associations which do studies of their membership.

This section on measurement of individuals perhaps has focused too
much on the use of tests and the concerns of psychometricians. There
are other methods of collecting information, although some may be more
relevant to the research worker than to the practitioner. Observational
methods both in the field and the laboratory are important tools of data
collection, and many categorical schemes have been devised for classifying
behavior and interactions (Simon and Boyer, 1967 and 1970). The field
notes and participant observation of the anthropologist and the sociologist
need to be considered. Despite various problems and criticisms, the
survey is still a widely used research tool. The recently established
Social Science Research Council Center for Social Indicators is now
attempting to achieve consensus on the wording of a set of "background
variables" such as education, occupation, and marital status in order to
improve the comparability of data among surveys. The logic and methods
of survey analysis have been improved and refined over the past twenty
years. The interview provides great richness of detail and depth of
meaning, perhaps at the expense of comparability, but at certain stages
of research such data can be the source of great insight. The imaginative
use of school records, financial accounts, and administrative statistics
can provide valuable information. Such data fall in an important class of
unobtrusive measures (Webb, Campbell, Schwartz, and Sechrest, 1966)
which have the methodological advantage of being nonreactive, i.e., they
do not tend to modify the behavior of the person being studied. On the
other hand there are problems for which physiological measures may be
quite appropriate.

There is a considerable literature about each of these methods, and
a separate paper could be written about the advantages and problems of
each. For the moment we will have to content ourselves with the admonition
to NIE program managers to choose the method to fit the problem and to
recruit staff members from an appropriate range of disciplines and
methodological traditions.

In FY 73 NIE supported a program then called Field Initiated
Studies in which one of the panels focused on "objectives, measurement,
and evaluation". The total program provided $10,285,000 (commitments for
FY 73) in support of 193 projects. Of this, $917,492 went to 29 projects
concerned with objectives, measurement, and evaluation.

Under FY 74 plans for support of field initiated research, to be
administered by the Office of Research Grants, consideration is being
given to dissolving the Panel on Objectives, Measurement, and Evaluation
and instead assigning proposals in this field to other panels. From
the perspective of this paper such a step would be unfortunate. A
separate program area on the theory and methodology for educational
measurement is needed because panels reviewing proposals on substantive
problems concentrate on those problems as such. They tend to be satisfied
with current methods, even methods with known limitations, rather than
insist on confronting and resolving methodological difficulties. Further,
the support of field initiated research should be considered a key
strategy for support of theoretical and methodological problems. It is
a relatively non-mission oriented aspect of educational R&D and one in
which maximum freedom for the investigator is generally viewed as being
most productive.
RECOMMENDATIONS:

• The Office of Research Grants should maintain the
identity of the Panel on Objectives, Measurement,
and Evaluation and should provide leadership
through conferences and other activities in making
NIE's interest in this field known to the research
community and otherwise stimulating a larger
number of high quality proposals.

• The Task Force on Measurement should undertake or
support relevant instrumentation studies when the
need for specific research is identified in connection
with mission-oriented programs. This should include
research on instrumentation problems associated
with longitudinal research and the measurement of
change.
Measurement of Systems

Educational systems are also important units of analysis in the study
and practice of education. The measurement of individuals is the province
of psychometrics and is relatively well developed. Concern for the
functioning of systems is a more recent phenomenon, and the development of
theory, identification of the relevant variables, and the formulation of
appropriate measures are much less advanced. Here measurement specialists
and theorists in sociology, political science, economics, education, and
anthropology have much to contribute.

One may be concerned with the functioning of educational systems at
any of several levels. Although the classroom level is often thought of
as the lowest level of analysis, there are smaller units of some
importance: the peer group, teams of professionals and paraprofessionals,
pupil-teacher dyads, or other units. Above the classroom are the school,
the school district, state, and nation, with intermediate levels sometimes
of interest. The existence of multiple levels means that the status of a
given variable may change from problem to problem. For example, what is
a dependent variable in one problem may be a contextual variable in
another.

Programs and Processes 

For reasons difficult to divine, little is known about what goes on
in America's classrooms. Perhaps the nature of major federal programs
has had something to do with this. Most have been based on a model under
which resources were supplied to support unnamed and undescribed
"innovations" to alleviate some educational problem. The innovation to be
attempted was left to local choice and initiative. While guidelines were
provided and there were criteria for rejecting projects, the actual nature
of the treatments supported covered a very considerable range. Often
innovations have existed largely at the label level, with no common
understanding of what the specifications for the innovation were. Thus
terms like "teacher centers", "open education", "team teaching",
"educational renewal", and "differentiated staffing" have been little
more than conceptual inkblots to some, with each teacher, school, school
district, or federal official supplying his own meaning to these terms.

Educational developers, such as those in the laboratories, have had
to be more concerned with the nature of their treatments, for that is
what they were inventing. However, where their new products were tested
in comparison with "traditional" practice, the attempt to describe
"traditional practice" in detail and its similarities and differences
with the new product has often been lacking.

At the national level there is little known about the nature of
educational practice. In looking at the literature one sometimes gets the
impression that the schools are the same as they were 30 or 50 years ago,
while at others it would appear that very substantial changes have taken
place in a majority of schools. Perhaps both statements have some
validity, but they refer to different aspects of practice. What are the
facts? A specific project that NIE might wish to consider would be a
national sample survey of education practices. The feasibility of such a
study would depend largely on the ability to develop suitable measures of
educational programs and processes.



What might be the major facets of such a study? In describing what
he calls the "means of education", Bruce Joyce differentiates three
systems (Joyce, 1969):

A. The social system of the school

1. The normative structure
2. Student roles
3. Teacher roles

B. The technical support systems

1. Data storage and retrieval systems
2. Instructional systems
3. Information processing systems
4. Materials creation and consultation systems

C. Curriculum systems

1. Content of subjects or curriculum areas
2. Sequence
3. Repetition of ideas, principles or values
to provide continuity
4. Teaching strategies
5. Mode of presentation
6. Assessment and feedback systems

Of course these systems and components interact with one another. The
phenomena represent very different measurement problems, and the existence
of appropriate measures is quite uneven. Price has assembled a compendium
of operational measures of organizations (Price, 1973). Dreeben has
opened up some of the conceptual and theoretical problems of the normative
outcomes of schooling (Dreeben, 1968a and 1968b). He places schooling in
the context of socialization and argues that some of the important norms
learned in school are a function of the social system of the school rather
than the explicit curriculum. Some specific measures of the normative
structure are covered by Mason (1953). Many aspects of educational
practice can best be measured by observational methods. Research for
Better Schools has assembled information about a large number of
interaction analysis schemes (Simon and Boyer, 1967 and 1970). Corwin has
developed Guttman and Likert scales and other indices to measure structural
and group characteristics of schools, including standardization,
centralization of decision-making, patterns of supervision, group cohesiveness,
and professional and employee role conceptions of teachers (Corwin,
1970). Boocock and Cohen have each contributed to the conceptualization
of sociological variables at the school and classroom level as related to
student learning (Boocock, 1966 and 1973; Cohen, 1972).

Educational researchers and policy makers have tended to confine
their attention to the formal school system. Within that framework, in
terms of programs and processes, more is known about
elementary and secondary schools than about post-secondary education. But
outside of the formal school system there are tremendous amounts of
educational activity that take place in other settings: in employer
operated programs (e.g., NIE's Career Education Model II), in the armed
services, in the home through correspondence, television, and open
university programs, in evening schools, proprietary schools, etc.
Just as we must begin to understand schooling in relation to the total
socialization process (i.e., all the processes which determine how the
young become adults), we must be able to place "establishment schooling"
in relation to other explicitly educational institutions in our society.

The evaluation of educational programs has become a very important
type of inquiry in education. Some studies reflect a school of thought
that focuses almost exclusively on outcomes. When many studies have
indicated program failure or partial failure, the question of "why?"
inevitably arises. To make such causal inferences requires a different
kind of evaluation design, often referred to as evaluation research
(Suchman, 1967; Rossi and Williams, 1972). Such work requires the
conceptualization of different classes of variables and their
interrelationships. The identification of processes and programs becomes crucial in
such designs.

A frequent finding in evaluation studies is that the innovation or
product was not actually implemented in the manner specified by the
developer (Gross, 1971; Solomon et al., 1973). Clearly it is not enough to
use the developer's specifications as the measures of program and process;
the researcher must get into the classroom and determine what is actually
happening.

According to Suchman there are two possible sources of program failure.

If a program is unsuccessful, it may be because the program
failed to 'operationalize' the theory, or because the theory
itself was deficient. One may be highly successful in
putting a program into operation but, if the theory is incorrect
or not adequately translated into action, the desired changes
may not be forthcoming: i.e., "the operation was a success but
the patient died." Furthermore, in very few cases do action
or service programs directly attack the ultimate
objective. Rather they attempt to change the
intermediate process which is 'causally' related
to the ultimate objective. Thus, there are two
possible sources of failure: (1) the inability of
the program to influence the 'causal' variable,
or (2) the invalidity of the theory linking the
'causal' variable to the desired objective. We
may diagram these two types of failure as follows:

INDEPENDENT            INTERVENING            DEPENDENT
VARIABLE               VARIABLE               VARIABLE

Activity      --->     'Causal'      --->     Desired
or                     Process                Effect
Program

           Program                Theory
           Failure                Failure

According to this analysis, evaluative research
tests the ability of a program to affect the
intervening 'causal' process. Non-evaluative or
basic research, in turn, tests the validity of
the intervening 'causal' process as a determinant
of the desired effect. (Suchman, 1971)

Some investigators hold that it is at the point of interaction
between aptitude or trait and the treatment that great promise lies for
improving our understanding of the educational process (Cronbach, 1970).
The general notion is that there is no one "best" instructional program
for all students; rather, characteristics of students (e.g., personality,
ability or status variables) can be identified which exhibit differential
relationships with characteristics of treatments (e.g., inductive vs.
deductive or structured vs. unstructured). While a number of such
interactions have been found, most have not yet been replicated, and there
are many cases where hypotheses about interactions were not confirmed
(Berliner and Cahen, 1973).

Perhaps the person variables have been studied more carefully than
the treatment variables. Such research cannot succeed unless the
conceptualization and measurement of the treatments is equally
sophisticated and rigorous. We need to identify the important dimensions
that characterize educational treatments, develop a methodology for
quantifying them, and determine their usefulness for comparative
evaluation.
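The notion of an aptitude-treatment interaction described above can be made concrete with a small sketch. Everything in it is an illustrative assumption: the records are invented, the treatment labels are borrowed only as names, and the helper functions `slope` and `slopes_by_treatment` are hypothetical and come from none of the studies cited. The sketch shows only that an interaction corresponds to the outcome depending on aptitude differently (a different regression slope) under each treatment.

```python
from statistics import mean


def slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def slopes_by_treatment(records):
    """records: (treatment, aptitude, outcome) tuples -> slope per treatment."""
    grouped = {}
    for treatment, aptitude, outcome in records:
        xs, ys = grouped.setdefault(treatment, ([], []))
        xs.append(aptitude)
        ys.append(outcome)
    return {t: slope(xs, ys) for t, (xs, ys) in grouped.items()}


# Invented data: outcome rises slowly with aptitude under the structured
# treatment and steeply under the unstructured one.
records = [
    ("structured", 1, 5.0), ("structured", 2, 5.5), ("structured", 3, 6.0),
    ("unstructured", 1, 3.0), ("unstructured", 2, 5.0), ("unstructured", 3, 7.0),
]

slopes = slopes_by_treatment(records)
# A non-zero difference in slopes is the interaction in this toy setting.
interaction = slopes["unstructured"] - slopes["structured"]
print(slopes, interaction)
```

In real data the difference in slopes would have to be tested against sampling error and replicated, which is exactly where the text notes that most reported interactions have fallen short.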

For one thing, treatments cannot be reduced to curriculum materials
and teacher behavior; the social organization of the school and classroom
must also be understood. Concepts such as peer group, school climate,
role structure, compliance and control mechanisms, and type of grouping
are among those of importance. There is increasing experimentation with
the organizational aspects of education as witnessed by innovations in
team teaching, differentiated staffing, and open education. However,
much more needs to be known about the relation between group structure
and process, on the one hand, and social psychological consequences in
behavior on the other. Such research will require measures of qualitative
relationships within the learning group, over time.

NIE has sponsored work on the multi-unit school at the University of
Wisconsin Research and Development Center for Cognitive Learning, as well
as work on a variety of organization effects at the Center for Social
Organization of Schools at Johns Hopkins University. Some of the
research in this field has been reviewed by Boocock and Cohen (Boocock,
1966 and 1973; Cohen, 1972).

• NIE should design and conduct a national survey
of educational practice which would determine what
educational materials, methods, organizations, and
technologies are actually in use, identify innovative
or experimental programs, and determine the number of
pupils and instructional staff involved with different
practices.

• NIE should inaugurate a program of research and policy
studies designed to describe and understand educational
institutions and programs which fall outside the formal
school system, and should identify and study salient policy
issues concerning the relationship between the formal
and informal systems.

• NIE should use evaluation designs which provide careful 
measurement of treatments and the degree of their 
implementation and should support the development of 
such measures where appropriate. Explanatory models of 
evaluation are to be preferred. 

• NIE should give some priority in intramural and extramural
research programs to two substantive areas: (a) aptitude-
treatment interaction (ATI) or trait-treatment interaction
(TTI) studies, and (b) studies of the sociology of learning,
i.e., studies of the effects of social and organizational
factors on learning.



Inputs and Contexts 

The economic, political, ethnic, racial, community, cultural and
social systems in which schools and colleges are embedded provide
important inputs and contexts for the understanding of education.



Despite the apparent finding that variations in economic resources
have little effect on educational outcomes (Jencks, 1972; Coleman, 1966;
Spady, 1973), it is difficult to believe that there will not be a
continuing effort to study the effect of resource allocations. The key
to this would seem to be to move away from the conception of the school
as a black box into which resources are fed at one end, and out of which
educational results issue from the other; the question is what is the
money spent for, and what are the efficiencies of various uses of
resources? More recently Coleman has advanced a theoretical framework
for studying social change in terms of the conversion of resources
(Coleman, 1971). Certainly the allocation of economic resources will
continue to be an important political and social issue as a matter of
equity quite apart from research results or lack of results.

The Coleman Report and other studies have also pointed to the
rather large and stable effects attributable to family and community
factors, particularly socio-economic status. What is not so generally
recognized is that the variables used in such studies are mostly proxy
variables; it is difficult to infer directly from father's occupation
(for example) to achievement test results. Through what processes and
intervening variables are such effects produced? It is in this area
that understanding must be achieved before interventions can be designed.
There are substantial bodies of basic research literature which can be
focused on this problem which center on concepts and processes such as
socialization (Goslin, 1969; Inkeles, 1966 and 1969; Coleman, 1972);
self-concept/self-esteem (Crandall, 1973; Langenfeld, 1972); social
competency (Anderson and Messick, 1973; McClelland, 1973); and social
stratification (Duncan, 1968).



The importance of the larger sociocultural environment in
influencing formal education and the outcomes of schooling is the major
theoretical orientation of an important new book by Herriott and Hodgkins.

Given such a perspective, the conclusions of many studies
that "educational outcomes" are more likely a function of
factors outside of the school than of those within it, take
on a new meaning, for they illustrate the more general fact
that as "open" social systems, educational organizations
are continually influenced by society. Thus, it is not
simply that children within the educational system fail to
learn, but rather that what they learn is determined in large
measure by the interaction of school and society. (Italics
in original) (Herriott and Hodgkins, 1973, p. 15)

Two recommendations for NIE relate to inputs and contexts: 

RECOMMENDATION: NIE should support the development and
standardization of input and context variables as a
means of achieving greater understanding of the effects
of these factors on educational experience and as an aid
in improving the comparability and cumulativeness of
educational research. In addition, NIE should develop
a research agenda focused on the influence of elements
of the larger society on formal education. (See also the
discussion of monitoring indicators of social change
below.)



Outputs and Indicators 

There are several current and salient strands of thought that have
focused attention on the need for systems level output measures. Within
education there has been a call for greater accountability in the various
sectors of the enterprise (Stake, 1973; Levin, 1962). Among social
intervention programs generally the need felt for program evaluation has
stimulated considerable intellectual ferment and a whole new "evaluation
industry" (Wholey et al., 1970; Rossi and Williams, 1972; Suchman, 1967).
And a concern for understanding the meaning of rapid social change and
planning for the future have been principal factors behind the work done
on social indicators (Sheldon and Moore, 1968; HEW, 1969; Land, 1972).
All three lines of inquiry share a focus on the need for systematic data
basic to social policy decisions.

While there is considerable overlap in the domains encompassed by
each of these concepts, each has a somewhat distinctive perspective or
emphasis. The work on accountability tends to fall within the management
framework of the operating school system. What data do we have to
measure the effectiveness of our schools and school systems? (This
concept also reaches down to the individual level in its concern for
accountability of administrators and teachers.) Program evaluation tends
to take on the perspective of the Federal, state, or foundation program
manager who is administering "categorical" funds. Such programs cut
across operating organizations, introducing some incremental innovation
in each. Those who have used the concept of social indicators have tended
to be concerned with the operation of our institutions at the most
macroscopic level. Thus somewhat different conceptual frameworks have
evolved around the need to systematize policy decisions at each level of
the system.

The development of organizational output measures is still fairly
primitive, both conceptually and methodologically. The tendency is for
each investigator to develop his own measures, and often little work is
done to determine their validity or reliability. The listing of
compilations of measures in the bibliography includes systems level
measures as well as individual level measures. Also many of the basic
facts of educational attainment, degrees, etc., are collected on a
systematic and comparable basis by the National Center for Educational
Statistics of OE and by the Census Bureau, both in its decennial census
and the monthly Current Population Survey. However, there is little
agreement among researchers on the measurement of direct systems
variables of analytic interest.

There are a number of activities of current interest in and outside 
of NIE dealing with outputs and indicators at the systems level. 

The National Center for Higher Education Management Systems has been
developing management information systems for use by colleges and
universities. To date the work has included largely cost and other
administrative data, but they are moving more toward the measurement of
the benefits required by cost/benefit analysis. Some of the work on
outputs of higher education is covered in Lawrence et al., 1970.

Abt Associates, the evaluation contractor for the rural schools
within the Experimental Schools Program, is using a sophisticated
conceptualization of organizational change (based on general systems
theory: organizational environment, input, throughput, output, structure
and culture; and change stages: evaluation, initiation, implementation and
routinization) and has identified appropriate measures for its components.

The National Assessment of Educational Progress (NAEP) is an ambitious
attempt to ascertain the knowledge, skills, understandings and attitudes
of young Americans (Womer, 1970). Four age levels are sampled
(9-year-olds, 13-year-olds, 17-year-olds, and young adults between the
ages of 26 and 35) and ten subject areas. The focus is on the measurement
of attainment in an absolute sense rather than with reference to some norm:
what proportion of a given group possesses a given skill or knowledge?
The sampling is done in a manner which does not permit reporting of
results by school, school district, or state; rather results are reported
by region, size and type of community, sex, color, and parental education.
This limitation seemed to be necessary in order to establish the program
because of the sensitivities of states and school districts. However, a
number of states have now used the model to implement state assessment
systems (including Michigan, Maine, and Pennsylvania).

An important objective of the program is to measure change over time.
In any one year data are collected on only two subject areas, and each
area is reassessed approximately every five years. Some of the items are
repeated in each cycle, and so it is possible to determine whether the
level of attainment of an age group is increasing over time. The second
cycle of data gathering has begun for some subject areas, and change data
will soon be available.
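The "absolute" style of reporting described above, and the comparison of a repeated item across cycles, can be sketched in a few lines. This is only an illustration under stated assumptions: the response lists are invented, the function names `proportion_with_skill` and `change_over_cycles` are hypothetical, and the sketch ignores the sampling weights that an actual national survey would need.

```python
def proportion_with_skill(responses):
    """Fraction of sampled respondents who answered an exercise correctly."""
    if not responses:
        raise ValueError("no responses for this group")
    return sum(1 for r in responses if r) / len(responses)


def change_over_cycles(first_cycle, second_cycle):
    """Difference in proportion correct on one repeated exercise."""
    return proportion_with_skill(second_cycle) - proportion_with_skill(first_cycle)


# Invented responses (True = correct) of one age group to one repeated item.
first_cycle = [True, True, False, False]
second_cycle = [True, True, True, False]

p_first = proportion_with_skill(first_cycle)
p_second = proportion_with_skill(second_cycle)
change = change_over_cycles(first_cycle, second_cycle)
print(p_first, p_second, change)
```

No norm group enters the calculation: each figure is simply the share of the group that can do the exercise, which is what makes the cycle-to-cycle difference directly interpretable.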

These data are useful for a variety of purposes other than
descriptive monitoring, including analyses of curriculum content. The
NAEP staff publishes many helpful reports but does not claim to be
exploiting the complete potential of the data. This is one of a number of
data sets which various programs in NIE could make valuable use of for
secondary analysis purposes. Some of the methodological problems of
secondary analysis of sample surveys are covered by Hyman (1972).

There is an important limitation to the value of the NAEP data,
namely the paucity of data on other variables to use in its analysis and
interpretation. Some limited information is collected on background
variables (e.g., age, sex, region), but no information on the nature of the
educational programs to which respondents have been exposed. Thus NAEP
must be classified as another example of "black box" research which fails
to include important educational variables. It is granted that there may
be difficulties in collecting such information, given the constraints
under which the project operates. Possibly such analyses can be performed
on data collected in some states and local school districts which have
patterned their assessment systems after NAEP.

Representative Albert Quie has introduced legislation which would
make the methodology of National Assessment the basis for a major change
in the manner of distributing Federal funds for the disadvantaged (HR
5163). Until now the funds for Title I of the Elementary and Secondary
Education Act have been distributed on the basis of economic indicators,
used as proxies for educational disadvantagement. Quie notes the lack of
a perfect correlation between economic and educational measures of
disadvantagement and proposes that the distribution should be based on
direct educational measures. His bill would require collecting NAEP-type
data on reading and mathematics on a basis which would permit reporting of
results for each state. The individual states would, in turn, be
required to implement state assessment programs which would be the basis
for allocating the funds to local districts. NIE has been discussing the
implications of this approach with some of the experts in the field.
While there are many attractive features to the proposal, there are a
number of problem areas in which research would be highly desirable before
becoming committed to a large scale national program having great
significance in the allocation of large amounts of federal support. Among
the more important are the following (Madaus and Elmore, 1973):

• What would be the effects of the negative incentive
feature of the bill and how would they compare with
a positive incentive system?

• What is known about the effects of other external
testing programs, particularly the problem of
"teaching to the test" and whether such programs
have the effect of inhibiting innovation and
homogenizing the educational program?

• In the several states that have adopted state
assessment systems, what has been the effect of
such systems, especially in Michigan where the
system is used to allocate resources?

• How can NAEP-type data be aggregated and summarized,
and what are the methodological problems involved
in setting performance standards?



The social indicators movement has had a short and checkered history
(Brooks, 1972). While there is considerable disagreement on the meaning
of the term, there seems to be consensus on several elements: (a) social
indicators are time series data which permit the monitoring of change over
extended periods of time, permitting the separation of long term trends
from short term fluctuations; (b) they may be either quantitative or
qualitative; and (c) they can be disaggregated by relevant attributes of
either the persons or the conditions measured (such as skin color or year
of construction) and by the contextual characteristics that surround the
measure (such as region or city size) (Sheldon and Freeman, 1970). Among
the early hopes were that a system of social accounts could be developed
comparable to the system of economic accounts, and that the indicators
would be directly useful for program evaluation and the setting of goals
and priorities (National Commission on Technology, Automation and
Economic Progress, 1966; HEW, 1969). Senator Mondale has introduced
legislation which would establish a Council of Social Advisers responsible
for preparing an Annual Social Report to the President.

More recently some of these early statements of expectations have
been criticized as unsound and unrealistic (Sheldon and Freeman, 1970;
Sheldon and Land, 1972). The social area lacks a common metric and a
model of the social system from which to derive a system of social
accounts. Social indicators are the product of multiple causes, and the
effects of specific government programs cannot be disentangled from other
causes. The setting of goals and priorities ultimately depends on value
choices, not the assembling of data.

Nevertheless there seems to be agreement that the concept is still
useful in relation to the key function of monitoring social change, both
in its objective dimensions (Sheldon and Moore, 1968) and subjective
dimensions (Campbell and Converse, 1972). It is also helpful in pointing
to the need for standardization of measures in the social field.
Furthermore, we are beginning to see the development of models of systems
or sub-systems which provide some understanding of causal networks (Land,
1972; Anderson, 1973). The full potential of the social indicators
concept will not be reached until the indicators can be integrated into
explanatory models and theories; but this advance in turn may be dependent
on the development and improvement of appropriate measures.

Currently the Office of Management and Budget is circulating a draft
social indicators report which would be issued on a periodic basis. The
education section of the report consists of Census and OE data on
enrollment, retention, graduates, and degrees plus some of the National
Assessment results.

The National Science Foundation sponsors a program of research on
social indicators. Three projects or activities of special interest to
NIE are: (1) development of a framework for national goals accounting
(National Planning Association, 1972); (2) support for the Social Science
Research Council's Center for Coordination of Research on Social Indicators
(which among other things is seeking to standardize the wording of a
number of "face sheet" items frequently used in sample surveys); and (3)
several projects to develop uniform measures of social competence.

The importance of non-educational indicators for NIE lies in the fact
that many of the major changes in education have come about in reaction to
forces originating outside of the educational sector, such as Supreme
Court decisions, the "baby boom," Sputnik, the war on poverty, the
movement for community control, concern with youth unemployment, and the
movement of women into the labor force (C. Williams, 1973). Also, in a
rapidly changing society the schools of today need to anticipate the
nature of the society which tomorrow's graduates will be entering.
Programs are needed within NIE which focus on the interface between
education and other key sectors and seek to prepare students for
tomorrow's world.

Educational indicators are, of course, a type of social indicator.
The educational field is relatively rich in the number of statistical
time series available through the National Center for Education Statistics
(NCES) of the Office of Education, the Census Bureau and various state and
local educational agencies. However, the indicators available vary
considerably in their usefulness for either theory or practice. For
example, while there is considerable information about inputs and
gross outputs like graduates and educational attainment, there is little
specific information about educational practice or on the knowledge,
attitudes and behaviors of pupils and students.

Granted this problem, there are more time series available than are being
properly exploited. Important data are often available at state and local
levels when not available nationally. Fortunately the situation is
beginning to change, and a few efforts can be cited which show how such
data might be used and how they need to be improved. Abbott L. Ferriss
has been one of the leading workers in this particular vineyard. In 1969
he published Indicators of Trends in American Education, in which he
reviewed a large number of the time series and identified significant
trends that were observable. He has urged the usefulness of such data in
serving a monitorship function (Ferriss, 1972). Monitoring consists of
determining whether new observations represent the continuation of past
trends or whether they signal a turning point. If the latter, the task is
to determine whether the change has significant consequences for the
future, particularly for other normatively significant elements in the
system. Clearly this kind of function is essential to any policy analysis
activity in NIE.
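The monitoring task just described, deciding whether a new observation continues a past trend or signals a turning point, can be sketched as follows. This is a deliberately simple illustration and not a method attributed to Ferriss: the straight-line trend, the tolerance `band`, the function names `fit_line` and `classify_observation`, and the series itself are all assumptions made for the example.

```python
from statistics import mean


def fit_line(ys):
    """Least-squares line through (0, ys[0]), (1, ys[1]), ...

    Returns (intercept, slope).
    """
    xs = list(range(len(ys)))
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def classify_observation(past, new_value, band):
    """Compare a new observation with the extrapolated past trend.

    `band` is an analyst-chosen tolerance (an assumption of this sketch).
    """
    intercept, slope = fit_line(past)
    predicted = intercept + slope * len(past)
    if abs(new_value - predicted) <= band:
        return "continuation"
    return "possible turning point"


# Invented series, e.g. a completion rate in percent over four periods.
past = [50.0, 52.0, 54.0, 56.0]
on_trend = classify_observation(past, 58.5, band=1.0)
off_trend = classify_observation(past, 53.0, band=1.0)
print(on_trend, off_trend)
```

A flag of this kind only begins the task; as the text notes, the analyst must still judge whether the change has significant consequences for other elements in the system.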



Ferriss has suggested that there are at least four types of
educational indicators that would be highly useful for monitorship,
providing clues to intervention:

• Measures of the educational status of the population,
primarily the out-of-school population; for example,
ideally this would be an inventory of the skills in
the population; practically as a minimum we now can
determine the following: years of school completed
(by various traits, such as age, sex, color, etc.),
percent of the population with various degrees, by
field, percent of the population certificated at
given levels of competence by various professions, etc.

• Educational progress of the school population:
continuation ratios by age, sex, color, etc.; grade
progression; dropout rates; completion rates, etc.

• Qualitative information on the staff of educational
institutions.

• Measures of characteristics of the school. Characteristics
chosen should possess demonstrated relationships to
educational outcomes, that is, those that are dictated by
explanatory models and theories. (Ferriss, personal
communication)

NCES has been concerned with rationalizing its statistical system and
has commissioned a number of papers by Selma Mushkin of Georgetown
University to explore the problem of output measures in education (Mushkin,
1971; 1972a; 1973). A recent product has been Indicators of Educational
Outcome, Fall 1972 (Cobern, Salem and Mushkin, 1973), which includes a
classification of outputs of potential value (see Table A).



Table A. Summary Classification of Outputs*
With Selected Examples

Time Phase 1 (Primary Effects)

Product Consumption                        Investment
Quantity           Quality                 Income             Employment
-----------------  ----------------------  -----------------  -----------------
Number of          Attitudes,              Value added,       School dropouts,
students,          Attributes,             Earnings,          Unemployment
High school        Aptitudes,              Added earnings,    rate,
completions,       Achievements            etc.               etc.
etc.               (e.g., self-esteem,
                   creativity, IQ,
                   SAT scores)

Time Phase 2 (Secondary Effects)

Investment Feedback                Consumption Feedback
---------------------------------  ---------------------------------
Economic growth (e.g.,             Consumer information,
years of schooling,                Consumer efficiency,
lifetime earnings                  Medical care use,
differentials)                     Use of leisure time,
                                   Moral and citizenship values,
                                   etc.

Time Phase 3 (Tertiary Effects)

Intergenerational Impacts
---------------------------------
Educational motivation of
children

* In addition to benefits to students, there are benefits to parents
such as the babysitting or child care activities of the school.

Source: Cobern, Salem and Mushkin, 1973, p. 7.

NCES is also the sponsor of the National Assessment of Educational
Progress, discussed earlier, which will provide useful education indicators
once the cycle of data gathering starts to produce time series results.
Some agreed-upon way of computing summary scores is also needed.

The Office of Education has been sponsoring a program of monitoring
social trends at the Educational Policy Research Center at the Stanford
Research Institute (SRI). This work is based on a "future research"
framework (C. Williams, 1973). NIE needs to develop some formal ties to

RECOMMENDATIONS: NIE should organize a small staff in the
Planning and Policy Analysis Unit to monitor social and
educational change through such activities as:

• Analysis of educational and social indicators 
published by other agencies 

• Conduct and support for projects building 
explanatory models of the educational system 
and the larger social system in which it is 
embedded 

• Identification and refinement of measures or 
variables needed in the models 

• Liaison with organizations collecting indicator
data, with the OMB social indicator unit, the
SRI Center, and other relevant organizations

• Support for special extramural studies of the
impact of outside forces on education

• Serve as an information resource for the 
National Council on Educational Research 

All groups in NIE need to be as sensitive to the need for
systems measures as to individual measures for the
understanding of programs, processes, inputs, contexts,
outputs and indicators. Recommendations made in the
previous section of the paper regarding the activities
of the agency-wide Task Force on Measurement and the
Exploratory Studies Group on Measurement, Methodology
and Secondary Analysis should be expanded to encompass
the need to improve our measurement of systems.

Systems Effects of Testing 

It is not easy to separate the issues surrounding the effects of
testing into individual and system effects since many individual effects
have system consequences when aggregated. For this reason many of the
points made in the earlier section on individual effects of testing and the
problem of bias are relevant here as well. Nevertheless it will be useful
to refocus our attention on the problem from the systems perspective.

One of the features of social change in the past ten years has been
the decline of the melting pot philosophy and the growth of cultural
pluralism. Some years ago Florence Kluckhohn pointed out that not all
departures from dominant culture patterns are deviant, i.e., "bad"
(Kluckhohn, 1953). Any society, particularly one as complex and
heterogeneous as ours, legitimizes departures from the most common modes of
behavior for certain groups and roles under certain circumstances. Thus
we have both dominant and variant culture patterns which are viewed as
legitimate. We have been witnessing the proliferation of variant culture
patterns in the United States during the past decade.

Problems arise when the construction, use, or interpretation of tests
or other measures is not anchored in an appropriate cultural frame of
reference (ETS, 1973). Standardized tests which have been normed on
white middle class populations might be quite invalid if used to assess
the general ability of a lower class black; yet if we shift the frame of
reference they might be quite accurate in reflecting the assimilation of
the lower class black into the dominant culture. By the same token, the
"BITCH Test" (Black Intelligence Test of Cultural Homogeneity, Education
Daily) may be an accurate reflection of intelligence of those raised
within a particular ghetto sub-culture, but it would be useless for either
blacks or whites in relation to any activities outside of the sub-culture.

So what do we mean by cultural pluralism or variant cultures? Some
advocates of "bilingual-bicultural" education speak as if we have or
would like to achieve multiple parallel societies such as those found in
Quebec or Belgium. Certainly this is implied when they advocate school
programs in which a full curriculum in Spanish is offered K-12 in parallel
with an English curriculum for all pupils. However, such a parity is not
now reflected in occupational and other spheres of our society and is not
likely to be in the foreseeable future. Indeed, one suspects that the
chief goal of most minority group parents, whatever their pride in their
own ethnic heritage, is for their children to become full members of the
majority society, at least in their occupational roles. The point is to
recognize that there is no inconsistency between the parallel existence
of dominant and variant cultures so long as one can sort out which is
appropriate in various times and circumstances.

Questions have been raised concerning the use of tests and other
assessment procedures to serve gatekeeping functions in the stratification
system: sorting children into different tracks or curricula within the
school system; selecting those to be admitted to college; and selecting
those for admission to or placement within the occupational world. Some
critics feel that the system is too decisive at various points and argue
for keeping options open for longer periods. Furthermore, the selection
process is difficult to defend when evidence is often lacking that the
criteria used to sort and select have direct relevance to later
occupational success and may often mislabel young people on the
probability of educational success.



While the arguments against educational selection often seem
compelling, we need to proceed with caution.

... we should not overlook two possibilities:
that our schools and colleges generally may be
more meritocratic--use more universal standards
for advancement--than the world of work; and that
loosening the meritocratic or allocative function
of education may create more inequality of
opportunity than presently exists, leaving the
most important educational decisions (e.g., who
goes to college and where) to fall once again
upon the family, social heredity, or politics.
If indeed our economic system arbitrarily
discriminates against racial, sex, and other
"minorities" to the extent that some observers
have indicated, one could argue for more rather
than less universalistic standards in educational
selection and a closer rather than a looser fit
between educational attainment and occupational
placement. At least we should proceed cautiously
in condemning our schools and colleges for setting
standards which not everyone is expected to achieve.
Unlike the world of work where the norms of
achievement are frequently and perhaps necessarily
evaded (e.g., in job rights and seniority), schools
may be the more important arena for "letting the best
man win." (Clark, 1970)



We have already alluded to the possible effects of external testing 
programs in discouraging innovation or departure from a dominant core of 
content. The National Assessment of Educational Progress, it should be 
noted, employs an elaborate process of identifying consensus objectives 
on which to base its exercises. It would be important to determine 
whether such a methodology has a rigidifying effect on school programs, 
either in connection with NAEP itself, or the use of comparable assessment 
systems at state or local levels. 
RECOMMENDATION: 

An Exploratory Studies group should undertake a 
program of research on the effects of testing and 
other assessment methods which would study such 
problems as: 

• How does the selection and channeling process 
now operate in schools and how can it be 
improved? What is its effect on different 
cultural sub-groups in the population? Do tests 
foster a narrow conception of ability and reduce 
the diversity of talent available to schools and 
society? 

• What effect does testing have on the diversity 
and innovativeness of school programs? Do new 
technologies, such as item banks and computer 
testing, provide solutions to problems posed by 
older methods? 



This research program should not be conducted in isolation from other NIE 
activities, but rather should work through the agency-wide Task Force on 
Measurement and "piggyback" on other programs, such as those dealing with 
bilingual education, education for the urban disadvantaged, and the 
evaluation of experimental schools, wherever possible and appropriate. 
Some of these issues will be studied using a unique experimental design in 
the Boston College project examining the introduction of testing in 
Ireland. 



Theoretical and Methodological Issues 

As with measurement of individuals, we will eschew a detailed 
treatment of theoretical and methodological issues concerning the 
measurement of systems. Instead we will content ourselves with noting 
some of the different types of measures encountered at the systems level 
and citing a number of recent papers which discuss some of the principal 
methodological issues. 

A rough categorization of types of measures would include (a) 
aggregated data, or characteristics measured by summing data from 
individuals or lower order systems; (b) context data, or data 
characterizing higher order systems; (c) direct systems measures, or 
characteristics which are not derivative of either lower or higher order 
systems; and (d) derived measures, or measures such as ratios which 
represent relationships between other variables. Frequently it happens 
that the investigator concerned with one level of analysis is forced to 
adjust data obtained at a different level of analysis. When this happens, 
serious methodological problems can be encountered (Herriott and Huse, 
1973). 
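
The four categories above can be illustrated with a minimal sketch in 
Python. Everything in it is an assumption made for illustration only: the 
pupil, school, and district records, the field names, and the function 
school_measures are hypothetical and do not come from any study cited in 
this paper. 

```python
from statistics import mean

# Hypothetical records for illustration only; no figures come from the paper.
pupils = [
    {"school": "A", "score": 62},
    {"school": "A", "score": 70},
    {"school": "B", "score": 55},
    {"school": "B", "score": 81},
    {"school": "B", "score": 74},
]
schools = {
    "A": {"district": "North", "teachers": 1, "has_written_grading_policy": True},
    "B": {"district": "South", "teachers": 2, "has_written_grading_policy": False},
}
districts = {
    "North": {"per_pupil_budget": 900},
    "South": {"per_pupil_budget": 1100},
}


def school_measures(school_id):
    """Return one measure of each type, taking the school as the system."""
    scores = [p["score"] for p in pupils if p["school"] == school_id]
    school = schools[school_id]
    district = districts[school["district"]]
    return {
        # (a) aggregated: summarizes data from lower order units (pupils)
        "mean_score": mean(scores),
        # (b) context: characterizes the higher order system (the district)
        "district_per_pupil_budget": district["per_pupil_budget"],
        # (c) direct: a property of the school itself, not of pupils or district
        "has_written_grading_policy": school["has_written_grading_policy"],
        # (d) derived: a ratio expressing a relationship between two variables
        "pupils_per_teacher": len(scores) / school["teachers"],
    }


if __name__ == "__main__":
    for school_id in schools:
        print(school_id, school_measures(school_id))
```

Note that the aggregated measure (a) describes the school and not any 
single pupil; reading a school mean back onto individuals is one form of 
the cross-level difficulty mentioned above. 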

Coleman has made a number of contributions: a survey of 
methodological problems in sociological analysis including those 
encountered when trying to use social indicators for policy analysis 
(1969); an explication of the methodological foundations of policy research 
in the social sciences (1972a); and problems in using standardized tests 
to evaluate school performance (Coleman and Karweit, 1970). Rigsby and 
McDill have examined the conceptualization and measurement of adolescent 
peer influence processes (1972). Finally, Riley has reviewed a number of 
issues concerning the sources and types of sociological data (1964). 

An NIE Strategy for Program Development 

The measurement problem area is unique in that it cuts across all 
other problem areas, yet also stands apart as a discipline or 
sub-discipline in its own right with its own theory and methodology. Thus 
NIE faces the dilemma of choosing a centralized or decentralized strategy 
in mounting initiatives to deal with the problems outlined in previous 
sections of this paper. 

As has already been anticipated in earlier recommendations, a mixed 
strategy is advised, coinciding with the recommendations of the 1972 
conference (Kooi, 1972). A completely decentralized approach is not 
desirable because "investigators working on substantive problems 
concentrate on those problems as such. They tend to employ current 
methods, even methods with known limitations, rather than turn aside to 
confront and resolve the methodological difficulties they meet" (Fiske, 
1972). Furthermore, the use of common measures and common methodology 
among problem areas can be a powerful force toward reducing the 
fragmentation of education research and promoting the cumulation of 
research knowledge. On the other hand, a completely centralized approach 
is not desirable either. Isolated measures have little meaning. They 
take on meaning as they are used to develop theories and models and to 
solve problems. This is the only way to establish the construct validity 
of measures. 

Consequently, we recommend a mixed strategy in which certain 
functions and responsibilities are assigned to the various substantive 
programs within NIE while others are allocated to a central unit, and 
the two are tied together through a form of matrix management. 

Decentralized Functions 

Each NIE program should include a complement of measurement 
specialists. This group will often coincide with or overlap with those 
charged with evaluation functions within the program. They should be 
drawn not only from the tests and measurements field of educational 
psychology, but also from among measurement specialties in sociology, 
economics, and other disciplines. 

Some measures tend to be unique to a problem (e.g., special 
instruments for bilingual populations) while others are common to many 
problems (e.g., turnover of personnel). While the Task Force on 
Measurement will attempt to identify and coordinate work on common 
measures, much of the work of instrument development, refinement, and 
validation must take place in the context of substantive research 
programs where their usefulness in theoretical models can be determined. 

Theory and methodology of measurement can be handled best through a 
combination of intramural research and some targeting of field initiated 
research in the Office of Research Grants. If it is agreed that work in 
the measurement field should be an NIE priority and that we wish to 
stimulate an acceleration of work in the field, it would be highly desirable 
to identify a special Panel on Educational Measurement with its own funds. 
The Study Group working with this panel should work with the field to 
stimulate the flow of high quality proposals to the grant program. 

Centralized Functions 

One of the task forces within the Office of Research and Exploratory 
Studies should be made up of measurement specialists (possibly combined 
with concerns for methodology and secondary analysis, as seems to be the 
plan). This group should develop its own program of intramural and 
extramural research, concentrating on those problems that either cut 
across other programs or are not covered by other programs. These would 
include research on the effects of testing and other forms of measurement 
on individuals and systems, and work on new technologies such as item 
banks or computer testing. This staff should also serve as a resource 
for other programs in NIE when special needs arise. They would be the 
first group to whom the Director and Council would look when problems or 
inquiries regarding measurement arise. They would handle contacts and 
control correspondence with outside individuals making inquiries about 
measurement programs in NIE (with referral to more specific programs as 
appropriate). 

We have also recommended that the NIE Library should have a 
measurement information specialist on its staff to assist NIE researchers 
in locating instruments, data banks, and the specialized literature of the 
measurement field. 

Matrix Management 

While it is possible to undertake most if not all of the work 
recommended in this paper through allocation of responsibilities in 
either the centralized or decentralized mode, there remains a need to 
coordinate this work in order to maximize the synergistic effect. The 
fragmentation of effort has been one of the curses of educational research, 
and NIE needs to take special steps to avoid it. It is therefore 
proposed that a form of matrix management be utilized by forming an 
agency-wide Task Force on Measurement. This Task Force would be chaired 
by the director of the Task Force on Measurement, Methodology, and 
Secondary Analysis in ORES and would include representation from the 
Study Group on Objectives, Measurement, and Evaluation of ORG, the 
Planning and Policy Analysis Unit of ORDR, the Educational Reference 
Division of OA, and measurement specialists in the line research units. 
This group should serve to coordinate work involving measurement in the 
several organizational units, promote the use of common measures where 
appropriate, cumulate and codify new knowledge as it emerges, develop 
standards for technical review of proposals, RFP's and products, and 
generally continue to build and refine an agency-wide strategy for the 
improvement of measurement for the various clients and purposes 
identified earlier. The effectiveness of such a group will be 
considerably enhanced if it has some funds at its disposal with which to 
support intramural research activities of its members. 

Conclusion 

This paper has covered a very diverse range of topics in a very 
broad field. Admittedly no one topic has been covered in the depth it 
deserves. However, a major purpose of the paper will have been served if 
the reader has gained a new conception of the range and complexity of the 
measurement field. 

This work was originally undertaken because a number of reports 
prepared for the Planning Unit which preceded the establishment of NIE had 
recommended the development of instruments to measure a broader range of 
pupil outcomes. While the measurement of basic cognitive abilities is 
relatively well advanced, we do not have accurate and credible measures 
of other kinds of pupil performance that many consider important 
objectives of education, including problem-solving ability, moral values, 
social maturity, skill in interpersonal relationships, and other affective 
and higher order cognitive abilities. 

While agreeing with the need for new pupil outcome measures, we 
have attempted to show that NIE should extend the range of its concern 
with measurement along a number of other dimensions as well: 

(1) Our ability to measure characteristics of individuals is farther 
advanced than our ability to measure systems. Understanding the operations 
of systems is important both in its own right and in the contribution it 
can make to understanding individual growth and change. 

(2) Similarly, psychometrics is a better developed field than the 
measurement sub-disciplines of sociology, political science, and other 
disciplines. As an interdisciplinary problem area, educational R&D needs 
to include measurement research in all these fields, and NIE needs 
measurement specialists from each of them on its staff. 

(3) Standardized tests represent only one way of collecting 
educational data. Support needs to be given to improvement of other data 
collection methods, including observation, questionnaires, interviews, 
administrative records, financial accounts, and other unobtrusive 
measures. 

(4) The measurement needs of the research and development community 
are not coterminous with those of operating school systems. As an R&D 
agency NIE must contribute to the solution of measurement problems faced 
by researchers, developers, evaluators, and change agents as well as those 
of practitioners. 

(5) While it is important to measure outcomes of education that 
correspond to explicitly stated objectives, it is also important to detect 
and measure the unplanned and unintended consequences of educational 
programs. 

(6) It is not enough to measure the outcomes of education at the 
individual or systems level. Research designs that treat schools as a 
"black box" are not likely to be useful. Our understanding of education 
and the ability to devise solutions to problems depend on our ability to 
identify and measure inputs, contexts, and processes related to those 
outcomes. Further, measures and the variables they represent cannot be 
neatly classified by analytic function; the same dimension might be an 
input, an output, or a context depending on the problem and the design. 

(7) Above all, the importance of theory in deciding what ought to 
be measured needs to be recognized. It is not enough that technically 
correct instrument development techniques are used; there is a serious 
need to know more about what our instruments are measuring. A major 
effort should go into establishing the construct validity of measures. 
Wherever possible measures should be identified as part of larger systems 
of variables, theories, or models which seek to establish causal 
relationships. 